Demographic modification of gene expression in predicting diffuse large B cell lymphoma (DLBCL) treatment response

MSt Healthcare Data Science

Authentication of practice

I confirm that I have fully read and understood the assignment brief for this module. Y

Details

Name: Chung Yan Surname: Yu
Submission Date 19-10-25
Word Count: whole assignment including codes
Word Count: main body excluding abstract, references and supplementary materials

Permission to share your assignment

I do give permission to share my assignment with future MSt participants.

University statement of originality

This assignment is the result of my own work and includes nothing which is the outcome of work done in collaboration except as declared in the Preface and specified in the text. It is not substantially the same as any that I have previously submitted for a degree or diploma or other qualification at the University of Cambridge or any other university or similar institution, or that is being concurrently submitted, except as declared in the Preface and specific in the text. I further state that no substantial part of my Portfolio has already been submitted, nor is being concurrently submitted for any such degree, diploma or other qualification at the University of Cambridge or any other university or similar institution except as declared in the Preface and specified in the text.

I confirm the statement of originality as above Y

Questions for reflection

Self-assessment is an important aspect of feedback literacy, which is, in turn, key to the development of expertise. As you proceed through the MSt Healthcare data science programme, we hope that you will make use of the following prompts to assess your own work on assignments. Specific assignment briefs will likely indicate which of these to address for which assessments, but, in general, we expect you to respond to one or two for each assignment on your course.

For each of the questions, do not spend too long answering – keep it brief. For each question you answer, limit yourself to no more than three items. And please remember, this is optional and developmental: these cover sheets are designed to create space for self-assessment and feedback dialogue, rather than additional assignment workload.

  1. Which aspects of this assignment are you most uncertain about and/or would most like to receive feedback on?
  2. What elements are you left pondering after this assignment that you would like to discuss further?
  3. How have you incorporated feedback from peers and tutors into this assignment?
  4. How, and to what extent, have you been able to incorporate feedback on previous course work into this assignment?
  5. Using the wording in the rubric, how would you describe the quality of the different aspects of your work?

Declaration of the use of generative AI

Which permitted use of generative AI are you acknowledging? Semantic search of literature and notes, outline creation, code debugging, output formatting and generation, informed feedback
Which generative AI tool did you use (name and version)? Claude Code v2.0.3x (the version likely changed over usage), Claude Desktop v1.0.3x with PubMed Connector, M365 CoPilot
What did you use the tool for? Searching for relevant journal publications, sanity check on coding strategies, collating notes in my Obsidian vault, debugging code by parsing error messages
How have you used or changed the generative AI’s output E.g. my collated notes always stay in point form so that I write out the paragraphs in my own words. Feedback provided by the GenAI models are weighed and assessed before being acted on. Code generated are checked, e.g. it tried to run Levene test on a model fitted using residuals as the response, this was rectified back to the relevant response variable.

Abstract

Diffuse large B cell lymphoma (DLBCL) is the most common non-Hodgkin lymphoma, with prognosis influenced by demographic features (age, gender), integrated scoring systems such as the International Prognostic Index (IPI) and genetic alterations. Oncogenes MYC, BCL2, BCL6 emerged with strong DLBCL associations, and further studies have identified over 150 genetic drivers. Yet whether these genetic markers’ effects differ across demographic subgroups remains understudied. This study analysed the gene expression data of 21 genes from 775 DLBCL patients to investigate whether age and gender modify gene expression-response associations. Hierarchical logistic regression models tested demographic-gene interactions across 21 genes and 2 demographic factors (age and gender, totalling 42). Only CCND2 showed significant interaction effects (Likelihood ratio test adjusted p = 0.006) with age on predicting complete response. Older age groups showed significant increase (61-75 odds ratio p = 0.014; >75 odds ratio p = 0.001) in complete response probability with higher CCND2 expression, while younger patients (≤40, 41-60) showed the opposite insignificant trend. Gender revealed no main or interaction effects across all genes. MYC showed significant prognostic effects (likelihood ratio test adjusted p = 0.010) with response completeness that were consistent across all demographic groups, with no interaction effects. With only 1 of 42 tested demographic-gene combinations demonstrating interaction effects, the demographic modifying effects remain a rare phenomenon amongst genetic markers examined. This suggests demographic stratification may generally be unnecessary for genetic markers. The age-dependent effects of CCND2 could however provide a starting point to reconcile conflicting prognostic associations of CCND2 expression levels in existing literature, although further validation in other independent cohorts and mechanistic confirmation are required.

Introduction

Diffuse large B cell lymphoma (DLBCL) is the most common type of aggressive non-Hodgkin’s lymphoma (NHL)1,2, with a wide range of factors contributing to prognosis. For a comparable and comprehensive pre-treatment prognosis model, the international prognostic index (IPI) emerged as a standardised model for DLBCL and other aggressive NHLs3, combining 5 factors: age, Ann Arbor stage classification, ECOG Performance Status, serum lactate dehydrogenase (LDH) level and the number of extranodal sites of disease. By pooling factors together, the IPI outperformed simple disease stage classification models such as Ann Arbor.4 went on to improve on the IPI, creating the National Comprehensive Cancer Network International Prognostic Index (NCCN-IPI) as an update to the IPI. A key innovation of the NCCN-IPI was using cubic splines to model the continuous factors of age and LDH levels as higher resolution categorical factors, modifying them from binary factors to quaternary and ternary factors respectively. As prognosis models improved, so did treatment. The first generation chemotherapy of cyclophosphamide, doxorubicin, vincristine, and prednisone (CHOP) had a cure rate of 30-35%5. With the addition of rituximab to the regimen (R-CHOP) and other advances, DLBCL becomes curable for more than 60% of patients6.

While age is incorporated into the IPI (and subsequent NCCN-IPI), other demographic factors such as gender are not, despite established evidence for their prognostic involvement. Males demonstrate higher hazard ratios7 and gender-associated pharmacokinetic differences in rituximab lead to worse treatment responses in males8. These demographic effects raise questions about whether risk factors perform uniformly across patient subgroups. Beyond demographic and clinical factors, a series of genetic alterations have emerged as strong prognostic markers: BCL29, BCL610 and MYC11. These changes are observed at both the genotypic and phenotypic levels, as translocation mutations and protein overexpression respectively12. Furthermore, patients with varying combinations of these alterations have shown worse responses to R-CHOP treatments, highlighting the clinical relevance of genomic profiling. Some genetic markers have shown age-associated expression patterns, and certain genetic drivers (BCL6, IRF4) lose prognostic significance when age is incorporated in multivariate models13. This indicates that demographic factors may not merely add to genomic risk additively, but could modify how these genomic drivers influence clinical outcomes.

With the advent of machine learning methods came attempts to construct models based on clinical and genomic data.14 performed whole exome and transcriptome sequencing in 1001 DLBCL patients, identifying 150 genetic drivers including BCL2, BCL6, and MYC alongside numerous newly characterised candidates. By combining genotypic alterations, gene expression data and cell-of-origin classification, they developed a genomic risk model that outperformed the IPI. However, the potential for demographic factors to modify these genomic associations remains understudied. Given that age and gender both demonstrate prognostic significance and influence treatment response, it remains unknown whether genomic risk markers perform uniformly across demographic subgroups. If these demographic factors modify how gene expression levels impact treatment response, such that the same gene expression levels in different gender or age groups lead to different prognostic outcomes, it would be crucial that these differences are identified and applied in future models to prevent inaccurate prognoses.

This study addresses this gap by investigating whether age and gender interact and modify the prognostic associations between gene expression markers and therapy response in DLBCL. The associations between the demographic factors (age & gender) with complete therapy response and the gene expression levels of 21 genes were first considered. Logistic regression models using gene expression levels of 21 genes were fitted with therapy response, and the interaction of demographic factors with these gene expression levels were examined. Exploring how demographic factors interact with known and potential genomic drivers of DLBCL could reveal age or gender-dependent tumour biology that inform therapeutic approaches.

The analysis utilises clinical and gene expression data15 from Reddy et al.’s genomic risk model study14 of 1001 DLBCL patients, focusing on 775 patients with complete gene expression profiles for 21 genes including the big three oncogenic drivers (MYC, BCL2, BCL6) and cell-of-origin classifiers. Detailed data breakdown and processing are documented in the Methods section.

Methods

Data Availability & Access

The datasets were publicly available on the ScienceDirect webpage of Reddy et. al’s study14, and the corresponding author was contacted via email to inform them of this secondary analysis. A public GitHub repository was created to host the processed datasets, attribution, license information and supplementary materials to ensure data openness, transparency and reproducibility. For the original study, the authors obtained anonymised patient clinical information and tumours, which were processed in line with a protocol approved by the Institutional Review Board at Duke University. Thus no further anonymisation is required. A total number of 2 datasets were downloaded as Excel files directly from their links on the aforementioned ScienceDirect, Table S1 and Table S2 using the download.file function from R’s utils package16 and read via the readxl package17. The specific sheets were selected then cleaned to be exported as .csv files (see 01-data-preprocessing.R19). The processed .csv files were uploaded to this project’s GitHub repository: mmc1-ClinicalInformation.csv (raw), mmc2-GeneExpression.csv (raw). This RMarkdown document is self-sufficient and contains code for the relevant analysis and plots shown only, for a complete analysis including assumption testing, post-hoc analysis, tests that showed insignificant results, please refer to the sequenced scripts in supplementary-materials folder of this repository. This document also supports sourcing the datasets from 3 sources for redundancy: original ScienceDirect links, processed .csv files on GitHub and local processed .csv files when this repository is downloaded. For this knitted20 HTML document, both datasets were loaded directly from ScienceDirect’s supplementary materials: S1 Table (clinical data) and S2 Table (gene expression data).

Data Cleaning & Selection

The 2 datasets were joined using common patient IDs and 2 new columns were added. age_group_nccn was obtained by transforming the age at diagnosis variable into 4 categorical groups according to cutoffs defined by NCCN-IPI4: ≤40, 41 - 60, 61 - 75, >75. complete_response was added by combining “Partial response” and “No response” as “Incomplete response” from response_therapy while keeping “Complete response” unchanged. This prepared the dataset for hierarchical binomial logistic regression. As for duplicate gene expression columns for BCL6, the two columns were compared to verify identicality and 1 was discarded. The resulting table was one with 775 patients and 27 variables, of which include the anonymised patient ID; demographic variables such as gender (binary: F/M), age (continuous, at diagnosis) and age group (quaternary, as defined by NCCN-IPI); clinical information of therapy response (ternary: Complete response, Partial response, No response); and the log2 transformed gene expression levels of 21 genes: MYC, BCL2, BCL6, ITPKB, MME, MYBL1, DENND3, NEK6, LMO2, LRMP, SH3BP5, IRF4, PIM1, ENTPD1, BLNK, CCND2, ETV6, FUT8, BMF, IL16 and PTPN1.

Gene expression data were limited to these 21 genes available from the original study’s datasets (S1 & S2), representing 2 categories. The first category is the “big three” oncogenic drivers of MYC, BCL2 and BCL6, serving as strong prognostic markers. Moreover, MYC21 and BCL613 have both shown age-dependent prognostic effects, suggesting age may modify how these markers influence clinical outcomes. The second category comprises 19 cell-of-origin classifier genes (DLBCL subtypes: ABC vs GCB)22, with BCL6 appearing in both categories. Despite their initial utility for classification, genes such as CCND2 and LMO223 have shown to be strong survival predictors alongside BCL2 and BCL6. More importantly, IRF4 is among the genetic features showing age associations and contributes to genetic complexity that loses prognostic significance when age is incorporated13, alongside BCL6. Both categories therefore contain genes with potential to interact with demographic factors, making them suitable for testing whether demographic factors modify their prognostic associations.

Overall, data completeness for these 27 columns was exceptional, with 23 columns being 100%, and the remaining 4 columns all with more than 93% completeness. Processing is documented in 02-data-processing.R. Characteristics of the cohort are summarised24 in Table 1.

Table 1. Baseline Cohort Characteristics

Characteristic Total (N=775) Complete Response Incomplete Response
Sample size N = 775 n = 598 n = 126
Gender
Female 338 (43.6%) 258 (43.1%) 54 (42.9%)
Male 437 (56.4%) 340 (56.9%) 72 (57.1%)
Age at diagnosis, years 61 ± 15.4 59.8 ± 15.7 63.1 ± 14.3
Age groups (NCCN-IPI)
≤40 years 75 (10.1%) 67 (11.5%) 8 (6.5%)
41-60 years 249 (33.5%) 202 (34.8%) 41 (33.1%)
61-75 years 285 (38.3%) 218 (37.5%) 47 (37.9%)
>75 years 135 (18.1%) 94 (16.2%) 28 (22.6%)

Demographic and clinical characteristics stratified by treatment response.
Values shown as n (%) for categorical variables and mean ± SD for continuous variables.
Treatment response categories combine partial and no response as “Incomplete response” versus complete response.
NCCN-IPI age groups represent National Comprehensive Cancer Network International Prognostic Index classifications.
Percentages are calculated within each response group for gender and age group distributions.

Exploratory Statistics

Baseline gene expression patterns across demographic groups were examined using t-test (gender) and one-way Welch’s ANOVA25 (age groups) with false discovery rate (FDR)26 correction due to the number of genes analysed (see 04-data-analysis.R). Post-hoc Games-Howell tests were conducted for significant ANOVA results27. Assumptions for these tests were checked with diagnostic tests and plots, and the overall distribution of gene expression against age and gender visually examined (see 03-data-exploration.R). Second, associations between therapy response completeness and demographic factors were examined with the Chi-squared tests, with expected frequencies checked to be \(> 5\) (see 05-response-analysis.R).

Interaction Effects

The baseline information from Exploratory Statistics only describes demographic-associated patterns, but not whether demographic factors modify the prognostic associations of gene expression. To systematically isolate and analyse these prognostic associations of gene expression, demographic factors and the interaction between the two, hierarchical logistic regression models28 were fitted for all 21 genes and 2 demographic factors (42 combinations). Each combination had four logistic regression models fitted to identify significant terms that contribute to therapy response completeness, the models were nested as:

  1. Null model: therapy response ~ 1
  2. Gene-only model: therapy response ~ gene expression
  3. Main effect model: therapy response ~ gene expression + demographic factors (age or gender)
  4. Interaction model: therapy response ~ gene expression + demographic factors (age or gender) + gene-demographic interactions

All fitted models were analysed with likelihood-ratio test (LRT) to obtain FDR-corrected p-values. Significant models were then checked for model fit with goodness-of-fit with Chi-square tests and Akaike Information Criterion (AIC) scores, along with influential observations and collinearity29 (see 06-effect-analysis.R.)

Results

Exploratory Associations

Baseline Gene Expression Patterns by Demographics

Gene expression levels showed no significant differences across gender groups after t-test with FDR correction. Ten genes showed significantly different expression levels after Welch’s ANOVA with FDR correction across NCCN-IPI age groups: BCL2 (p = 0.0002), BLNK (p = 0.003), MYBL1 (p = 0.010), SH3BP5 (p = 0.016), ITPKB (p = 0.020), CCND2 (p = 0.020), PTPN1 (p = 0.031), BMF (p = 0.034), FUT8 (p = 0.037), LMO2 (p = 0.038). Post-hoc analysis with Games-Howell revealed that age groups mostly clustered into two statistically distinct groups for each gene, save for BLNK with three groups Figure 132.

Figure 1. Gene Expression Patterns Across Age Groups.

Fig. 1
Box plots show gene expression distribution by age group for genes with significant Welch’s ANOVA results (FDR-adjusted p < 0.05).
Compact letter display (CLD) letters (a, b, c) above each boxplot indicate statistical groupings from Games-Howell post-hoc tests within each gene; groups with different letters differ significantly (p < 0.05).
Sample sizes (n) are shown below each age group.
Outliers are highlighted in darker borders.

Therapy Response by Demographics

Neither gender (p = 1.000) nor age group (p = 0.173) showed significant associations with complete response, although incomplete response proportions did trend upwards as age increased. While neither demographic factor showed prognostic effect on treatment response, the key question whether they modify the gene expression-therapy response association is addressed in the section below.

Gene Expression & Demographic Interactions in Predicting Response

Gene-only Effects

As part of fitting hierarchical models for each gene with either age or gender groups, gene-only models predicting therapy response (i.e. no demographic modification) were also fitted (Figure 2). One gene out of 21, MYC, displayed significant gene-only effects (LRT FDR-adjusted p-value = 0.010, AIC = 647.57, Chi-square goodness-of-fit p-value = 0.947), with higher expression predicting lower complete response probability. With the lowest AIC score among the 8 MYC models, adding demographic or demographic-MYC interaction did not improve model fit, indicating MYC’s prognostic effect is consistent across age and gender groups (Figure 3a).

Figure 2. Gene-only Effects on Complete Response

Fig. 2
Orange: Adjusted significant
Blue: Adjusted insignificant
Lilac: Insignificant
Confidence interval: 95%
MYC is the only gene showing significant gene effect after FDR correction, being well clear of the 1.0 reference (no effect) reference line even when considering its CI error bars.

Demographic Modification Effects

Systematic testing of 21 genes across both age and gender yielded 42 gene-demographic models, of which no gender model showed significant effect. Neither gene-gender interaction nor gender main effect models attained significant results for any gene after FDR adjustment. For age, only the CCND2 gene showed significant (LRT FDR-adjusted p-value = 0.006, AIC = 643.12, Chi-square goodness-of-fit p-value = 0.973) effects amongst the interaction models of all 21 genes, and the rest 20 genes showed neither main (additive) nor interactive effects with age. Having the lowest AIC score (min delta = 12.7) amongst the 4 CCND2 models indicate that response completeness is best predicted with gene expression when its interaction with age is factored in.

Clear age-dependent directional effects emerge for CCND2 (Figure 3b), with higher CCND2 expression levels affecting the response completeness of the age groups differently. The curves for age groups ≤40 and 41-60 and the curves for age groups 61-75 and >75 are going in opposite directions. Complete response probability decreases with CCND2 expression level for the 2 younger groups, while complete response probability increases with CCND2 expression level for the 2 older groups. Further scrutinising the odds ratio33 of the 4 age groups (Figure 4) confirms that both older age groups showed significantly higher complete response odds (61-75 p = 0.014; >75 p = 0.001), and younger age groups (≤40, 41-60) have trends for worse but statistically insignificant odds.

Figure 3. Logistic Regression Models - Gene Expression Effects on Complete Response

Fig. 3
A. Logistic regression plot of MYC gene expression against complete response probability, age-interaction model
B. Logistic regression plot of CCND2 gene expression against complete response probability, age-interaction model
The MYC gene showed no significant age-interaction effects, contrasting the curves of the different age groups for CCND2.

Figure 4. CCND2 Interaction Effects - Age Group Modifying Gene Expression Effects on Complete Response

Fig. 4
Odds ratio of age groups
Orange: Significant
Lilac: Insignificant
Confidence interval: 95%
Both older age groups showed significant modifying effect.

Discussion

This systematic analysis tested whether demographic factors modify gene expression associations with treatment response across 21 genes and 2 demographic factors (42 total combinations). Only CCND2 showed significant age interaction effects, while MYC showed significant gene-only effects. Gender revealed neither main nor interaction effects for any gene. The rarity (1 out of 42 combinations) suggests demographic stratification could be generally unneeded for most genetic markers, with CCND2 representing an exception pending validation in other cohorts.

CCND2’s age modifying effects significantly associate higher expression with better outcomes in older patients (61-75, >75) but younger patients have an opposite yet statistically insignificant trend. This represents a novel finding with no precedent in literature. While it is established that CCND2 is invovled in the cell cycle and is also a target for the MYC gene34, no prior studies have demonstrated age-dependent prognostic association, and the underlying biological mechanism remains unclear and requires investigation. This finding also provides a possible opening to reconcile conflicting CCND2 literature, where studies have reported both positive35 and negative37 prognoses associations. These differences could be observing genuinely different effects rather than contradicting results. A reanalysis of these datasets while factoring age effects could be good first step to validate this premise.

MYC’s consistent effects across all demographic groups validates its established oncogene status38 and its role as a universal prognostic marker, contrasting CCND2’s age-dependence. The other two of the “big three” oncogenes BCL2 and BCL6 only showed marginal gene-only effects that lost significance after FDR correction.

The lack of gender effects aligns with the findings of NCCN-IPI’s authors, where they found no substantial enhancement to their model when including gender4, and further validates gender’s exclusion from the index.

While the adequate sample size \((n=775)\) allowed for systematic testing of 42 combinations, several limitations warrant consideration. As a study on demographic effects, ethnicity information was absent, despite probable international patient involvement. Gene expression data was limited to 21 genes pre-selected for publication, missing other potential genes and interactions. It would also be of great boon to the study to be able to benchmark the analysis here with Reddy’s genomic risk model. Unfortunately, the (hyper-)parameters required to rebuild such model along with the interactive webtool for this model are no longer accessible. This highlights the need for reproducibility in cancer research39 and bioinformatics40 in general, and make me appreciate the emphasis of robust replicability our lecturers had imparted on us during the first module.

Future opportunites include validating CCND2’s age interaction effects in other cohorts, and reanalysing studies with conflicting CCND2 prognosis associations with age stratification. After validation in different cohorts, perhaps mechanistic studies could investigate the biology behind this age interaction effect, alongside broader screening for demographic interactions across potential genomic markers.

Conclusion

This analysis of 775 DLBCL patients discovered 10 genes BCL2, BLNK, MYBL1, SH3BP5, ITPKB, CCND2, PTPN1, BMF, FUT8 and LMO2, showed age-dependent gene expression levels. One gene CCND2 showed modifying effects with age on predicting therapy outcomes, and the MYC gene showed gene only effects with age. The discovery of CCND2’s modifying effect could partially resolve the conflicting accounts of how CCND2 activity associate with prognosis. This is an exciting find that still needs work to replicate and validate.

References

1.
The Non-Hodgkin’s Lymphoma Classification Project. A clinical evaluation of the international lymphoma study group classification of non-hodgkin’s lymphoma. Blood 89, 3909–3918 (1997).
2.
Wang, S. S. Epidemiology and etiology of diffuse large b-cell lymphoma. Seminars in Hematology 60, 255–266 (2023).
3.
Shipp, M. A. et al. A predictive model for aggressive non-hodgkin’s lymphoma. New England Journal of Medicine 329, 987–994 (1993).
4.
5.
6.
7.
8.
Habermann, T. M. Is rituximab one for all ages and each sex? Blood 123, 602–603 (2014).
9.
10.
11.
Chenevix-Trench, G., Behm, F. G. & Westin, E. H. Somatic rearrangement of the c-myc oncogene in primary human diffuse large-cell lymphoma. International Journal of Cancer 38, 513–516 (1986).
12.
Petrich, A. M., Nabhan, C. & Smith, S. M. MYC-associated and double-hit lymphomas: A review of pathobiology, prognosis, and therapeutic approaches. Cancer 120, 3884–3895 (2014).
13.
14.
Reddy, A. et al. Genetic and functional drivers of diffuse large b cell lymphoma. Cell 171, 481–494.e15 (2017).
15.
Reddy, A. et al. Supplementary data for "genetic and functional drivers of diffuse large b cell lymphoma": Table S1 (clinical information and genetic alteration data for 1,001 DLBCL samples) and table S2 (ABC/GCB classification using gene expression data). (2017).
16.
R Core Team. R: A Language and Environment for Statistical Computing. (R Foundation for Statistical Computing, Vienna, Austria, 2025).
17.
Wickham, H. & Bryan, J. Readxl: Read Excel Files. (2025). doi:10.32614/CRAN.package.readxl.
18.
Wickham, H. et al. Welcome to the tidyverse. Journal of Open Source Software 4, 1686 (2019).
19.
Müller, K. Here: A Simpler Way to Find Your Files. (2025). doi:10.32614/CRAN.package.here.
20.
21.
22.
Wright, G. et al. A gene expression-based method to diagnose clinically distinct subgroups of diffuse large b cell lymphoma. Proceedings of the National Academy of Sciences 100, 9991–9996 (2003).
23.
Lossos, I. S. et al. Prediction of survival in diffuse large-b-cell lymphoma based on the expression of six genes. New England Journal of Medicine 350, 1828–1837 (2004).
24.
Zhu, H. kableExtra: Construct Complex Table with ’Kable’ and Pipe Syntax. (2024). doi:10.32614/CRAN.package.kableExtra.
25.
Kassambara, A. Rstatix: Pipe-Friendly Framework for Basic Statistical Tests. (2025). doi:10.32614/CRAN.package.rstatix.
26.
Reiner, A., Yekutieli, D. & Benjamini, Y. Identifying differentially expressed genes using false discovery rate controlling procedures. Bioinformatics 19, 368–375 (2003).
27.
Rongen, M. van et al. Core statisics. (2025).
28.
Robinson, D., Hayes, A. & Couch, S. Broom: Convert Statistical Objects into Tidy Tibbles. (2025). doi:10.32614/CRAN.package.broom.
29.
Hodgson, V., Castle, M., Nicholls, R. & Rongen, M. van. Generalised linear models. (2025).
30.
31.
Graves, S., Piepho, H.-P. & Sundar Dorai-Raj, L. S. with help from. multcompView: Visualizations of Paired Comparisons. (2024). doi:10.32614/CRAN.package.multcompView.
32.
Neuwirth, E. RColorBrewer: ColorBrewer Palettes. (2022). doi:10.32614/CRAN.package.RColorBrewer.
33.
Lenth, R. V. & Piaskowski, J. Emmeans: Estimated Marginal Means, Aka Least-Squares Means. (2025). doi:10.32614/CRAN.package.emmeans.
34.
35.
Wang, D., Zhang, Y. & Che, Y.-Q. CCND2 mRNA expression is correlated with r-CHOP treatment efficacy and prognosis in patients with ABC-DLBCL. Frontiers in Oncology Volume 10 - 2020, (2020).
36.
37.
38.
Dhanasekaran, R. et al. The MYC oncogene — the grand orchestrator of cancer growth and immune evasion. Nature Reviews Clinical Oncology 19, 23–36 (2022).
39.
Begley, C. G. & Ellis, L. M. Raise standards for preclinical cancer research. Nature 483, 531–533 (2012).
40.
Ziemann, M., Poulain, P. & Bora, A. The five pillars of computational reproducibility: Bioinformatics and beyond. Briefings in Bioinformatics 24, bbad375 (2023).